64 research outputs found

    La minería de datos, entre la estadística y la inteligencia artificial (Data mining, between statistics and artificial intelligence)

    Over the past decade we have witnessed the emergence of a new concept in the business world: data mining. Some companies have set up data mining units closely tied to senior management, and sessions devoted to data mining have taken centre stage at business forums. Data mining is presented as a new discipline, linked to Artificial Intelligence and distinct from Statistics. Meanwhile, in more academic statistical circles, data mining was initially regarded as just another fad, one that appeared after expert systems and had long been known as "data fishing". Is this really the case? In this article we examine the statistical roots of data mining and the problems it addresses, give an overview of its current scope, present an example of its application to television audience data and, finally, offer an outlook on the future.

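    Descripció i classificació de les comarques catalanes en regions homogènies segons l'ús de la terra (Description and classification of the Catalan comarques into homogeneous regions according to land use)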

    This article applies exploratory statistical techniques to large numerical tables of spatial statistics. The sheer volume of statistics compiled over a large area, as in a population census, often makes it difficult to assimilate all the information they contain. The article shows that these techniques allow a thorough understanding of such statistics without inspecting the tables themselves. The objectives usually pursued are: (1) to highlight the most salient characteristics of the statistics, such as associations and/or contrasts among the elements under study, which is easily achieved with descriptive factorial analysis; and (2) to group the basic elements of study into a limited number of representative classes, which can likewise be easily achieved with a simple ascending hierarchical classification algorithm. The application of this method demonstrates the compatibility of the two results. This normally corresponds to the final stage in the study of statistical tables in which observations relate to small areas or points. The natural desire to make the classes obtained coincide with geographical regions made it necessary to introduce the content relationship into the ascending hierarchical classification algorithm. The application undertaken makes it possible to identify improvements in the interpretation of the classes obtained.

    The Longitudinal nature of patent value and technological usefulness exploring PLS structural equation models

    The purpose of this paper is to investigate the evolution of patent value and technological usefulness over time using longitudinal structural equation models. The variables are modeled as endogenous unobservable variables which depend on three exogenous constructs: the knowledge stock used by companies to create their inventions, the technological scope of the inventions and the international scope of protection. Two set-ups are explored. The first longitudinal model includes time-dependent manifest variables and the second includes time-dependent unobservable variables. The structural equation models are estimated using Partial Least Squares Path Modelling. We showed that there is a trade-off between the exogenous latent variables and technological usefulness over time. This means that the former variables become less important and the latter more important as time passes.

    PRESISTANT: Learning based assistant for data pre-processing

    Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of 5 different classification algorithms, such as J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.

    Perfil profesional del ingeniero informático: diagnóstico basado en competencias (Professional profile of the computer engineer: a competency-based diagnosis)

    Universities must train the engineers that society needs. Curricula in the European Higher Education Area (EEES) must therefore be designed around the professional competencies that society requires. Each school, however, has its own character, and must choose the competencies its graduates will have on completing their studies and design its curriculum around them. The selection of competencies defines the professional profile of its graduates, so objective evidence is needed to make this selection properly. This article presents the results of surveys of several hundred professionals and of a group of students and faculty at the Facultat d’Informàtica de Barcelona. The surveys show how much importance professionals attach to each competency, and thus define a professional profile. They also show how faculty and students perceive the learning of these competencies.

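    Disseny del Pla de Mostreig per l’estimació de la fracció de Residus Resta en la bossa tipus de Catalunya (Design of the sampling plan for estimating the residual waste fraction in the standard bag of Catalonia)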

    Final Report of PHASE 1 of the Minor Services Contract entered into by Barcelona Ecologia with the Universitat Politècnica de Catalunya.

    Intelligent assistance for data pre-processing

    A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. Typically, a dataset needs to be pre-processed before being mined. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives. As a consequence, non-experienced users become overwhelmed with pre-processing alternatives. In this paper, we show that the problem can be addressed by automating the pre-processing with the support of meta-learning. To this end, we analyzed a wide range of data pre-processing techniques and a set of classification algorithms. For each classification algorithm that we consider and a given dataset, we are able to automatically suggest the transformations that improve the quality of the results of the algorithm on the dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.

    On the effect of measurement model misspecification in PLS Path Modeling: the reflective case

    The specification of a measurement model as reflective or formative is the object of a lively debate. Part of the existing literature focuses on measurement model misspecification. This means that a true model is assumed and the impact on the path coefficients of using a wrong model is investigated. The majority of these studies are restricted to Structural Equation Modeling (SEM). Regarding PLS Path Modeling (PLS-PM), a few authors have carried out simulation studies to investigate the robustness of the estimates, but their focus is the comparison with SEM. The present paper discusses the misspecification problem in the PLS-PM context from a novel perspective. First, a real application on Alumni Satisfaction is used to verify whether different assumptions for the measurement models influence the results. Second, the results of a Monte Carlo simulation study, in the reflective case, help to bring some clarity to a complex problem that has not yet been sufficiently studied.

    On the predictive power of meta-features in OpenML

    The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (the model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML as the biggest source of data in this domain.

    Modelling with heterogeneity

    In this paper we present a methodology for dealing with heterogeneity in modelling when its sources are unknown. Although the approach is general, we present it for PLS-PM latent variable modelling. We call this approach PATHMOX. The idea behind PATHMOX is to build a tree of path models with a structure resembling a binary decision tree, with a model for a different segment in each node. The split criterion is an F statistic for comparing structural models, based on testing the equality of the path coefficients. We emphasize the rationale of the approach and its limitations. Finally, we present an application to an Alumni Satisfaction survey.
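The split criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the PATHMOX implementation: as a simplifying assumption, an ordinary least-squares line with one predictor stands in for a PLS-PM structural model, and the F statistic compares one pooled fit against separate fits for two candidate segments (a Chow-type test of coefficient equality). The function names `sse_simple_ols` and `split_f_statistic` are hypothetical.

```python
def sse_simple_ols(xs, ys):
    """Residual sum of squares of y = a + b*x fitted by least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def split_f_statistic(seg_a, seg_b, n_params=2):
    """F statistic for H0: both segments share the same coefficients.

    seg_a and seg_b are (xs, ys) pairs of lists. n_params is the number
    of coefficients per model (intercept and slope here, an assumption
    of this simplified OLS stand-in).
    """
    xa, ya = seg_a
    xb, yb = seg_b
    sse_pooled = sse_simple_ols(xa + xb, ya + yb)                # one model for all data
    sse_split = sse_simple_ols(xa, ya) + sse_simple_ols(xb, yb)  # one model per segment
    df_num = n_params
    df_den = len(xa) + len(xb) - 2 * n_params
    return ((sse_pooled - sse_split) / df_num) / (sse_split / df_den)
```

Under these assumptions, a tree-building procedure would evaluate this statistic for every candidate binary split and keep the split with the largest value, provided it is significant against the F distribution with `df_num` and `df_den` degrees of freedom.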